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Abstract 



Optimal prediction (OP) methods compensate for a lack of resolution in the nu- 
merical solution of complex problems through the use of an invariant measure as a 
prior measure in the Bayesian sense. In first-order OP, unresolved information is ap- 
proximated by its conditional expectation with respect to the invariant measure. In 
higher-order OP, unresolved information is approximated by a stochastic estimator, 
leading to a system of random or stochastic differential equations. 

We explain the ideas through a simple example, and then apply them to the solution 
of Averaged Euler equations in two space dimensions. 
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1 Introduction 



Many problems in mechanics, in particular problems involving turbulence, cannot be 
properly resolved because the number of significant degrees of freedom is too large. 
The problem of making numerical predictions about the behavior of systems that have 
not been properly resolved has been addressed in ^, ^ theoretical results can 
be found in ||9|, |l^; a general introduction to such methods can be found in Q. When 
a system is under resolved, nothing much can be said without additional information; 
in the papers just quoted, it is assumed that the additional information consists of an 
invariant measure on the space of solutions; this gives rise to an optimal Markovian, 
deterministic, approximation. An invariant measure constitutes additional information 
because everything not explicitly known is assumed to be distributed according to the 
invariant measure; it functions like a prior measure in Bayesian statistics |jl|. Unlike 
what happens in other areas of application of Bayesian statistics, nature often provides 
a rational choice of prior, invariant measure in the form of a canonical measure. The 
use of an invariant measure gives rise to approximations that are optimal in a sense 
that we shall specify below. 

In many problems this optimal approximation is still not accurate enough, and 
a higher-order, stochastic, approximation can be derived. In the present paper we 
explain these constructions with the help of a simple example, and then apply them 
to the solution of Averaged Euler equations in two space dimensions. The main diffi- 
culty in higher-order prediction lies in finding estimators for stochastic processes whose 
temporal correlations are determined only empirically. 



2 Optimal and stochastic prediction for Hamil- 
tonian systems 

We present properties associated with general, even infinite dimensional Hamiltonian 
systems, in the simple case of two oscillators, with position variables (?2 and momenta 
Pi,P2, and the Hamiltonian: 

H = H{q,p) = ^(^qj + qi+pl+pl+ pjpl) (1) 

(the "Hald system"). The equations of the motion of the system are: 

-|L = Pl+PlPi, 
^ = -01 

dq2 , 2 V^J 

dp2 _ 

We pretend that 4 equations in 4 unknowns are too difficult to solve on available 
computers but that 2 equations in 2 unknowns are accessible (A more realistic situa- 
tion is one where one has to solve, say, 10^'^ equations and one can afford only 10^). 
Alternately, suppose that for some reason at time t = we only have values for qi,pi 
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but not for the two other variables. The question is, how does one write equations for 
qi,Pi without computing q2,P2- 

In a standard, Galerkin, approach, one simply sets all the uncomputed variables to 
zero; this results in the system: 

dqi ^ dpi ^ 
dt ^ dt 

which is not a very good approximation. 

Suppose however that although the initial conditions for oscillator 2 are unknown, 
we do know that they are drawn from the canonical distribution: 

P{xi < qi < xi + dxi,X2 < q2 < X2 + dx2, 
yi<Pi<yi + dyi,y2 <P2<y2 + dy2) = 

Z^^ ex.p{—H{x,y)/T)dxdy, (3) 

where P is the probability of the event in parentheses, Z is a normalization constant 
that ensures that the sum of all probabilities is 1, dxdy = dxidx2dyidy2, T is a pa- 
rameter that controls the variance of the samples and is known for physical reasons as 
the temperature, and H{x,y) is the Hamiltonian function (jl]) with qi replaced by xi, 
pi replaced by yi, etc. Equation (|3|) is often written in the shorter symbolic form 

P{q,p) = Z-'eM-H{q,p)/T. (4) 

One can readily check that this probability distribution is invariant under the flow 
defined by (|2|), i.e., if the initial data are distributed as in (^), then the solutions 
9i,2;Pi,2 have the same distribution at all later times. Nature likes this distribution, 
and reproduces it often (see any book on statistical mechanics). 

Suppose now that the missing initial conditions are drawn from the canonical dis- 
tribution (^) conditioned by the known information qi,pi, i.e., 

P{x2 <q2<X2 + dx2,y2 < P2 < 2/2 + dy2) = eyip{-Hg^^p-^{x2,y2))dx2dy2, (5) 

for some T, where Hqi,pi is H where the values of xi,yi have been given the fixed, 
known values of these initial conditions. In the language of Bayesian estimation the 
canonical distribution (Q) is a prior distribution (what we believe the distribution to 
be before we have any data), and the conditional distribution (^) is a posterior distri- 
bution (the prior distribution modified by what we know in a special case). Averages 
with respect to the conditional distribution (^) are conditional averages, and denoted 
by (the information after the vertical line is what we know, and the prior 

distribution is implied). We now approximate equations (^) by the optimal prediction 
(OP) equations: 

dqi ,-, r 2 1 1 n r 2 1 i dpi , > 

-j^ = E[pi+piP2\qi,pi\=pi+piE[p2\qi,pi\, = -^li' W 

(clearly = pi)- In our particular case, 

E[pl\qi,Pi] = E\pl\pi] = 
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J Jp2 exp ^ (p? +pl + ql+q2+ pIpI) ) c?g2dp2/ 
y" J exp(-(- • ■))dq2dp2; 



(the argument of the exponential is the same in the denominator as in the numerator) . 
After obvious cancellations, 



^IpIIpi] = J P2 exp(-?'i/2 - plpl/2)dp2/ J exp(- • ■)dp2 = 
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1+pi 

(a function of pi). A general theorem states that in the mean square with respect 
to the invariant canonical measure, -^[pllpi] is the best of approximation of P2 by 
a function of pi. The error in this instance of OP is always smaller than in the 



Galerkin approximation above, though in this instance not by much (see [10|). One 
should think of the system (^ as producing the average of all solutions obtained by 
having initially values of qi,pi and sampling the other variables from the conditioned 
canonical distribution. An important result due to Hald (Q) states that first-order 
OP for a Hamiltonian system also forms a Hamiltonian system, with a renormalized 
Hamiltonian which is minus the logarithm of the original Hamiltonian averaged over 
all the "missing" variables. 

First-order OP may be optimal in a mean square sense, but it may not be good 
enough in many situations, and we wish to do better. In particular, the mean solution 
of (^) decays, while the solution of the OP equations (|6|) does not. 

This dichotomy can be understood in several equivalent ways. Prom irreversible 
statistical mechanics we know that the canonical measure represents thermal equilib- 
rium and that the means of all quantities tend to their equilibrium values even when 
they are initially conditioned by partial information; the symmetry properties of the 
Hamiltonian (|l]) ensure that the asymptotic mean is zero. We will now present a second 
explanation of the decay of the mean which will motivate our approach to higher-order 
optimal prediction. Rewrite the OP equations in the form: 

dqi , ... dpi 

— =p^+p,z{t), ^ = -qi, (7) 

(where z{t) is of course p^)- equations (^) the random function z{t) is approximated 
by its conditional mean. However, in truth z{t) varies from realization to realization of 
the initial data, and we are averaging over systems in which z{t) has a mean value and 
a fluctuation around this mean value. If one thinks of each copy of the system, which 
conserves energy, as moving on some constant energy surface, the surfaces are slightly 
different for different copies of the system and the systems move on their surfaces 
at different rates. The constant energy surfaces are sphere-like, the means of these 
dispersed systems fall ever closer to the common center of these surfaces, which is the 
origin in qp space. To capture this effect we need to take into account the variability of 
z{t); the system (0) with the initial data qi,pi can be viewed as a random or stochastic 
differential equation; the problem is that we have yet to figure out what z{t) looks 
like, as a function of t and as a random variable (the randomness coming from the 
initial conditions). As we now explain, the existence of an invariant measure places 
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constraints on z{t) but also helps in modeling it. This modeling has to rely on the 
specific properties of the system under consideration; some of the elegant generality of 
first-order optimal prediction will be lost. We shall give below an example of how a 
term such as z{t) can be estimated. 

3 The Langevin equation and fluctuation / dis- 
sipation theorems 

Consider a single particle interacting with a thermal sea of other particles, the whole 
being presumably described in detail by some inaccessibly complicated Hamiltonian 
system. We wish to describe the evolution of the single particle's velocity without 
explicitly calculating the evolution of the sea of particles. The Langevin equation is a 
standard approximation of this system: 

du 

— = -^u + n{t), (8) 

where u = u{t) is the unknown particle velocity, n{t) is white noise with zero mean 
and 7 is a constant. The noise n{t) repesents the fluctuating force exerted by the 
sea of particles, while the first term on the right hand side represents the mean force 
exerted by the sea which opposes the motion of the particle. The Langevin equation 
is a standard example that shows how an invariant measure constrains the random 
forcing term in a stochastic or random differential equation. This equation can be 
solved by elementary means If one thinks of u as the velocity of a particle of 

mass 1, then one should require that asymptotically, as t ^ oo, the distribution of u 
converge to the canonical distribution with density exp(— m^/2T); this is achieved 
if 7 = 2/T; this is a "fluctuation/dissipation" result One can understand it 

as follows: suppose 7 = 0; then the variance of u increases with time t (indeed it 
is proportional to t). If there is no white noise, the initial variability of u decreases 
because the term in 7 is damping. When the fluctuation/dissipation relation holds, 
the damping and the fluctuations imposed by the white noise balance asymptotically 
so that the invariant distribution is reached. Application of the fluctuation/dissipation 
theorems allows the prediction of the magnitude and direction of the mean force from 
the invariant measure and the statistical properties of the random force. The formula 
connecting the damping 7 to the temperature is based on the assumption that the 
autocorrelation of the noise is a delta function, as it is in white noise; more general 
fluctuation/dissipation theorems 0, 14, |l^ will not be used in the present paper. 



4 The Averaged Euler equations 

As an example of the application of these ideas, we consider the Averaged Euler equa- 



tions [13, [T^] in two space dimensions. We choose these equations because an analysis 
of the three-dimensional Euler equations presupposes an extensive discussion of turbu- 
lence, while the statistical mechanics of the usual two-dimensional Euler equations leads 
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to negative temperature states and other unusual phenomena. The two-dimensional 
Averaged Euler equations describe certain temporal averages of the Euler equations as 
well as certain viscoelastic flows, and their statistical mechanics is compatible with the 
machinery we have described. 
We introduce the operator 

A = {l-a^Ay, 

where A is the Laplace operator, a is a real constant, and s is a positive number. If 
s is not an integer, j4 is a pseudo-differential operator. The Averaged Euler equations 
are: 

dAu 

+ (u • V)Au + (Vu)^ • Au = -Vp, V • u = 0, (9) 

where u is a vector with components n^, a = 1,2. As a — > 0, these equations formally 
converge to the Euler equations. We consider a periodic domain, and expand the Ua 
in Fourier series: Ua = ^«,ke''''^i where x = (xi,X2),k = {ki,k2), and • denotes an 
inner product. Substitution into equation (^) yields the following equations of motion 
for the Fourier coefficients: 

^A(k)ua,]i = PQ,/3(k)(A;;!y^(k)n^,k-k''U/3,k' + k'pA(k - k')u^,k'it/3,k-k') (10) 

where PQ/3(k) = 6a/3 — with (5=Kronecker delta, k'^ = A;f + /c^, is the Fourier 

space projection on the space of divergence-free vectors {kiui^k + k2U2^k = 0,), (3, 7 are 
component indices and A(k) = (1 + a^k'^Y is the Fourier transform of the operator 
A defined above. (This is the straightforward Fourier series of the right-hand side of 
the projection form of equation ^, with the zero-divergence condition built-in and the 
pressure eliminated). 

Equation (^) conserves an energy, (u, ^u), where the inner product is the standard 
L2 inner product, and an "enstrophy", {A^^A^), where S, is the vorticity ^ = V x u. 
Each of these invariants, as well as any of their linear combinations with positive co- 
efficients, gives rise to an invariant measure with density of the form exp(— C/T), 
where C is a suitable linear combination. However, one can see from general con- 
siderations ([^]) that the energy is irrelevant: if there is no enstrophy in the expres- 
sion for the measure the resulting measure is not ergodic, while if the enstrophy is 
present the energy makes little difference. Thus we consider a measure with density 
exp(— (j4^, A^)/T), in Fourier variables; (a detailed example of such a construction 
is given in |^). The measure is carried by divergence- free vectors (all vectors whose 
divergence is not zero have probability 0), and up to the normalizing factor Z has the 
density 

exp(-^A;2(l + a2A;2)2«|uk|VT), (11) 

(For simplicity, we are assuming for the rest of the paper that s = 1 in the operator 
A). This expression shows why the negative temperatures of the usual Euler equations 
do not appear here: they are necessary in the Euler case to keep the enstrophy finite 
Q; here, for large k, Uk ~ k""^, which makes the enstrophy ^ /c^|up finite even when 
the temperature T is positive. 
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With this measure, one can see by a simple calculation that if all velocities are 
divergence-free, then 

^Kw] = fe2(i + ^2fc2)2 • (12) 

We want to model the evolution of a small number of Fourier coefficients, those that 
satisfy |k|oo = max(|A;i|, \k2\) < m, which we shall call the "resolved modes". We will 
refer to the remaining modes as "sampled modes" . We rewrite the evolution equations 
for the resolved modes as sums of terms that depend only on the resolved modes plus 
terms that also involve the sampled modes. For this purpose we take advantage of 
the fact that the right hand side of equation (^) is quadratic in u so that its Fourier 
transform is a convolution of the form 

^(/)Q,/3(k, k')ua^k'^i/3,k-k', (13) 

where the function (j)a/3 consists of expressions that guarantee incompressibility and 
perform the several differentiations; it depends only on the wave numbers but not on 
the amplitudes of the Fourier coefficients. The terms in the evolution of the resolved 
modes can be divided into three groups: 

1. Those where both |k — k'|oo < and |k'|oo < rn (i.e, both are in the resolved 
range); we denote their sum by G^. 

2. Those where one factor belongs to the resolved range and one does not; the 
structure of the convolution is such that the factor in the sampled range has 
a wave number such that |k|oo < 2m. In other words, resolved modes cannot 
interact with sampled modes further away. If we can model this subset of the 
sampled range, we have all the input we need to follow the dynamics of the 
resolved range. We call their sum G^. 

3. Those where neither factor belongs to the resolved range; their sum is G^. 
Thus the Fourier-space evolution equation (pX|) takes the form: 

— Ma,k = G^,k + G'a^k + ■ (14) 

In first-order optimal prediction G^ + G"^ would be approximated by its conditional 
expectation given the values of the Fourier components in the resolved range. The 
Averaged Euler equations share with the usual Euler equations the remarkable prop- 
erty that when the measure is Gaussian and based on energy and/or enstrophy, this 
conditional expectation is zero provided uq = 0. If uq 7^ the conditional expectation 
takes a simple form, that we shall not specify here because it will not be needed. 

5 Monte Carlo simulations 

The system of equations ([lO| ) is simple enough so that we can find its mean solution 
when we have partial data and the remaining data are drawn from a given distribution. 
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by sampling the distribution NensemUe times, evolving the system in time, and averag- 
ing. The system is also simple enough that we can determine the statistical properties 
of the modes, in particular their time covariances. The evolution equations are evolved 
with a fourth-order Runge-Kutta ODE solver with adaptive time-step control. A typ- 
ical simulation requires 10^ - 10^ runs for reasonable accuracy because Monte Carlo 
methods exhibit errors proportional to l/\/Nensemble- 

Our Monte Carlo simulation lead us to the following observations. First, the sim- 
ulation results are only weakly dependent on the size of the sampled region as long 
as it includes {k| |k|oo < 2m}. The evidence for this observation will be presented 
elsewhere. 

In the rest of this paper the time correlation function 

C(a,ki,ii;^,k2,t2) =< (^iQ,ki(ii) - iia,ki (ii))*('U/3,k2 (^2) - '"/3,k2(*2)) > (15) 

of the sampled modes plays an important role. Our numerical simulations lead us to 
the following observations about the correlation function: At time t = 0, when the 
probability distribution is known, the various Fourier components are independent. 
We observe that, to a very good approximation, this remains true for t > 0, and the 
time-correlation function is diagonal in k. We may therefore restrict consideration to 
the autocorrelation functions of the sampled modes. The spatial index structure of the 
autocorrelation is determined by the requirement that the velocity fields be divergence 
free. We make two further observations about the correlation functions: 

• The autocorrelation functions are well approximated in time as Gaussians whose 
width is a function of k. 

• The peak height of the Gaussian is well approximated by the magnitude of the 
correlation function in the invariant measure Certainly this must be true 
asymptotically as t ^ cxd. Our numerical calculations indicate that this is ap- 
proximately true at all times. 

These observations allow us to describe the correlation functions with a single param- 
eter cr(k): 

Cia,k,h;(3,p,t2) ~ 4,p ^2(i7a2fc2)2 ^"'^ e(*i-*2)V.(k)2 (16) 

In general, cr(k) will be a function of the direction and magnitude of the momentum 
vector k, time t, as well as the initial value of the resolved components. However, in 
practical application, it will be advantageous to approximate cr(k) by a function of the 
magnitude of k only. A reasonable approach to this would be to Monte Carlo simulate 
the correlation function C with all Ua,iL chosen from the invariant measure (i.e. no 
resolved modes). This produces the correlation functions of the invariant measure, 
which can only be a function of |k|. Figure |l| compares a"(k) at equilibrium, i.e., 
with all the modes sampled, to o"(k) calculated when the initial measure has specified 
resolved modes in the wave-number region [—5,5] x [—5,5]. 

Having determined the autocorrelation functions in the prior distribution, we shall 
now use them to approximate specific initial value problems with prescribed partial 
data, thus pushing the general methodology of OP to a higher order in the statistics. 



8 



0.2 



0.05 



• conditional measure 
□ invariant measure 



0.15 - 



"e # 



0, - ,° %^J"Mi 



-2 -1 

e 



Figure 1: Comparison of autocorrelation widths o'(k) in the invariant measure and in a 
specific run with prescribed partial data. Widths are scaled by multiplying by the magnitude 
of k. 

In particular, cr(k) will henceforth be modeled as a constant divided by the magnitude 
of k, where the constant is determined by the invariant measure, as suggested by Figure 

6 Approximation of an underresolved system 
by a stochastic differential equation 



Equation (14) shows that the resolved modes interact with the sampled modes only 
through the terms and . We shall ignore G^ since it is higher order in the sampled 
modes and hence suppressed by factors of In the following, we shall use Ua^k to 

refer only to the resolved modes, and Va,k to refer to the sampled modes. The sampled 
modes may be written as a mean plus a fluctuation: = Vk + (^Vk- Because G^ is 
linear in the sampled modes, we may write it as L(u)(v + (^v). The evolution equation 
for the resolved modes may now be written as 

^Ua,k = G^,k(u) + La,k-/3,pV/3,p + -Z>a,k;/3,p5'y/3,p (17) 

We will view the fluctuating part of the sampled Fourier modes, 5v, not as a 
dynamically evolving function of a random initial configuration, but rather as a random 
variable with specific statistical properties, just like z{t) in section 2. By construction 
6v has zero mean. The results of the previous section require 5vk to be uncorrelated 
with any mode with different k, and that its autocorrelation have the form of equation 



(IE). We shall not specify any higher order statistics. 
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The evolution equation above also involves the mean value v. We have not presented 
any empirical observations about v. However, by analogy to the Langevin equation, 
we can see that the average effect of the v term in the evolution equation is to provide 
a dissipation that counteracts the fluctuating 6v term. We thus write the evolution 
equation as 

— Wa^k = Go - 7a,k;/3,p^i/3,p + -^^a,k;/3,p'5i'/3,p (18) 

We can use the fluctuation-dissipation theorem to estimate the dissipation matrix 7. 
Because the matrix 7 is dominated by the diagonal elements, in numerical computations 
we will only use the diagonal element: 

V ^V^^W(q^ • p){A{p)p' - A(q)g^)^|upp 
^'^''^ k=p+q g4^(q)VA;M(k)2 ^ 

where q-^ = {q2,—Qi) and q runs over the sampled modes while p runs over the resolved 
modes. 

The new evolution equation is now a stochastic differential equation driven by 
random inputs c^v. The only Fourier modes that are evolved by the equations are the 
resolved modes; the sampled modes are not evolved. The statistics of the sampled 
modes represent the effect of the unresolved modes. 

The numerical algorithm described in section ^ can be modified for the stochastic 
differential equation. First, only the resolved modes are evolved. The Fourier am- 
plitudes Ua^k in the sampled region are chosen randomly from a population with the 
correct autocorrelation function (|T6|). This correctly models the L(u)5v term in equa- 
tion (0). Third, a dissipation term ( p^ ) is introduced on the right hand side to model 
the 7n term in ([T8|); this extracts from the noise the main part of its mean. Unlike 
what we saw in the Langevin equation of section 3 which has purely additive noise, we 
have no guarantee here that the fluctuation/dissipation formula will take into account 
all of the dissipation. In the evaluation of the dissipation term, we assume that the 
autocorrelation of each mode is a constant times a delta function, with the constant 
equal to the integral of the autocorrelation. What we have done is tranform a problem 
with random initial data into a stochastic differential equation. 

An initial test of the stochastic differential equation approach consists in computing 
a norm of the average solution as a function of time. We choose as norm the A- 
enstrophy of the solution: 

Aenstrophy{u) = ^ k'^{l + a'^k'^f\uk\'^. (20) 
k 

This norm gives an equal weighting to all the modes. The A-enstrophy is, of course, 
conserved in time in each realisation of the system, but not on the average ( see the 
discussion in section 2). 

Figure ^ shows the decay of the A-enstrophy of the average solution. One curve 
shows the true result obtained from many Monte Carlo samples of the full equations. 
The second curve results from the stochastic differential equation. Clearly the decay 
characteristics are accurately modeled by the stochastic differential equation. This 
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Figure 2: Comparison of decay of the mean A-enstrophy. The physical domain is [0,27r] x 
[0,27r]. The resolved region in wave-number space is [—5,5] x [—5,5]. The parameter a in 
the Average Euler equation was taken to be 1. First curve is the true decay as calculated 
by Monte Carlo. Second curve results from approximating system as stochastic differential 
equation. 



is an important improvement over first order optimal prediction whose evolution is 
governed by a renormalized Hamiltonian as noted above, and hence is not dissipative. 



7 Conclusions 

We have used the ideas of optimal prediction to reduce an underresolved problem to a 
stochastic differential equation. This stochastic differential equation has fewer modes 
than the full equation, and requires only partial information about the initial state. On 
the other hand, it does require prior knowledge about the statistics of the solution, as we 
expect in OP methods. The full power of higher-order, stochastic OP will appear when 
we create effective variance-reduction techniques for the stochastic differential equation. 
The real test of the ideas will come when we attempt to solve Euler and Navier-Stokes 
equations in three space dimensions; the outstanding problem is the formulation of 
a reasonable invariant measure in those cases. The OP approach transfers the onus 
of modeling turbulence from trying to guess relations between moments to trying to 
guess the relevant invariant measures (and awaiting a mathematical derivation of such 
measures). We also expect that the OP machinery will find important uses in other 
problems dominated by complexity, for example in molecular dynamics (see e.g.[|l2[). 

Note that Figure § illustrates a major principle that is often overlooked: an un- 
derresolved conservative system behaves on the average as a dissipative system. The 
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importance of this fact for the understanding of turbulence cannot be overstated. 
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